Abstract. Using a backtesting framework, we develop a new estimator for the tail index
of a distribution in the Fréchet domain of attraction. This estimator is equivalent to taking
a U-statistic over a Hill estimator with two order statistics. The estimator presents multiple
advantages over the Hill estimator. In particular, it has asymptotically $C^\infty$ sample paths as
a function of the threshold k, making it considerably more stable than the Hill estimator.

The estimator also admits a simple and intuitive threshold selection heuristic that does not 
require fitting a second-order model. 
1. Introduction

Researchers in multiple fields face a growing need to understand the tails of probability
distributions, and extreme value theory presents tools which, under certain regularity
assumptions, let us build simple yet powerful models for these tails. In the case of heavy-tailed
distributions, the setting of extreme value theory is as follows: Suppose our data is drawn
from a distribution F, and assume that there is a constant $\gamma > 0$ and some slowly varying
function L such that

$$1 - F(x) = L(x) \cdot x^{-1/\gamma}, \quad \text{with } \lim_{x \to \infty} \frac{L(ax)}{L(x)} = 1 \text{ for all } a > 0. \tag{1}$$

Then, F is in what is called the Fréchet domain of attraction.^1 If F satisfies this property
(which most commonly used heavy-tailed distributions do), extreme value theory provides
an elegant and concise description of the asymptotic properties of sample maxima of F. The
only challenge is that this description relies on knowledge of the parameter $\gamma$, called the
tail index of the distribution F. And, unfortunately, estimating $\gamma$ from data is not always
straightforward.

The literature on tail index estimation is quite extensive. One of the oldest and most widely
used estimators is due to Hill 1975, who suggests estimating $\gamma$ with a simple functional of
the top k + 1 order statistics of the empirical distribution:

$$\hat\gamma_H := \frac{1}{k} \sum_{j=0}^{k-1} \log \frac{X_{n-j,n}}{X_{n-k,n}}. \tag{2}$$
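As a concrete illustration of (2), the following is a minimal Python sketch of the Hill estimator. The function name `hill_estimator` and the Pareto test sample are illustrative choices made here, not part of the original paper.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the tail index from the top k + 1 order statistics."""
    x = sorted(data)                  # ascending: x[i - 1] plays the role of X_{i,n}
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]          # X_{n-k,n}
    if threshold <= 0:
        raise ValueError("the top k + 1 order statistics must be positive")
    # Average log-excess of the k largest points over the threshold.
    return sum(math.log(x[n - 1 - j] / threshold) for j in range(k)) / k


if __name__ == "__main__":
    random.seed(0)
    # Exact Pareto sample with tail index 0.5: X = U^(-0.5), U uniform on (0, 1].
    sample = [(1.0 - random.random()) ** -0.5 for _ in range(5000)]
    print(hill_estimator(sample, 200))
```

Sorting once and indexing from the top keeps the code close to the notation in (2); for repeated evaluation over many values of k a running sum of log order statistics would be more efficient.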



Hill showed that $\hat\gamma_H$ converges in probability to $\gamma > 0$, provided the threshold sequence k(n)
is an intermediate sequence that grows to infinity slower than the sample size n. Hill's idea
of using a functional of extreme and intermediate order statistics to estimate $\gamma$ has received
considerable attention. Csörgő et al. 1985 suggest ways to adaptively weight the order
statistics, while Dekkers et al. 1989 modify Hill's estimator so that it is also consistent for
negative $\gamma$. More recently, Feuerverger and Hall 1999, Gomes et al. 2008, and others have
worked on eliminating the asymptotic bias of the Hill estimator.

^1 Extreme value theoretic methods are often discussed in the context of a larger family of distributions,
characterized by a tail index $\gamma \in \mathbb{R}$. In this paper, however, we restrict ourselves to the Fréchet or
heavy-tailed case with $\gamma > 0$.

Nonetheless, tail index estimation remains quite challenging, especially for smaller sam- 
ples on the order of a few hundred to a thousand points. Of course, many difficulties are 
inherent to the subject matter: Only a small fraction of any sample will be inside the tail of 
the underlying distribution, and so even large samples may contain very little information 
relevant to inference about this tail. 

Other challenges, however, seem to arise from specifics of popular estimators. All
estimators for $\gamma$ require choosing a threshold at which the tail area of the distribution begins.
Ideally, specifying a good threshold should be easy, and the estimate $\hat\gamma$ should not be
sensitive to small changes in the threshold. Unfortunately, most commonly used estimators for $\gamma$
do not reach this ideal. In the case of the Hill estimator (where the parameter k from (2)
stands in for the threshold), the choice is far from innocuous:

• Inadequate choice of k can lead to large expected error. Small values of k have high 
variance, while large values of k usually have high bias. There is often an intermediate 
region for k where the estimator has fairly small expected error, but it is not always 
easy to find this region. 

• The Hill estimator is extremely sensitive to small changes in k, even asymptotically:
Mason and Turova 1994 show that the Hill estimator process converges in law to a
modified Brownian motion. Thus, even within the 'good' region with low expected
error, a minute change in k can impact the conclusions to be drawn from the model.

The problem of choosing the threshold k has been discussed, among others, by Beirlant
et al. 2002, Danielsson et al. 2001, Drees and Kaufmann 1998, and Guillou and Hall
2001. Most existing methods rely on fairly complicated auxiliary models: All but the last
of the cited ones require either implicitly or explicitly fitting a difficult-to-fit second-order
convergence parameter. As the method due to Guillou and Hall does not require fitting
secondary parameters, we use it as our main benchmark in simulation studies. The problem
of excessive oscillation of the Hill estimator has been discussed by Resnick and Starica 1997,
who recommend smoothing the Hill estimator by integrating it over a moving window. We
are not aware of any guidance on how to automatically select k for this smoothed Hill
estimator.

In this paper, we present a new estimator for $\gamma$ which greatly simplifies the problem of
threshold selection. Our estimator is based on a backtesting framework. It is well known
that sample maxima from a distribution F satisfying (1) have the following property: If
$X_1, ..., X_n$ are drawn independently from F, then as n goes to infinity, for some constants
$a > 0$, $b \in \mathbb{R}$,

$$\frac{\max\{X_1, ..., X_n\}}{L(n) \cdot n^{\gamma}} \Rightarrow G_\gamma(ax + b),$$

where $G_\gamma(x)$ depends only on $\gamma$ and $L(n)$ is an appropriately chosen slowly varying function.
Noting this, we may suspect that when F has positive support,

$$\lim_{n \to \infty} n \cdot \left( E[\log \max\{X_1, ..., X_n\}] - E[\log \max\{X_1, ..., X_{n-1}\}] \right) = \gamma. \tag{3}$$

In Theorem 3.3 we show that this relation in fact holds under very mild conditions on F
near 0.
Our estimator follows directly from this formula. We first estimate the quantities

$$E[\log \max\{X_1, ..., X_s\}]$$

by subsampling our data without replacement, and then use (3) to obtain an estimate for $\gamma$.
Since this estimator operates by computing the average log maxima of random blocks, we
call it the Random Block Maxima (RBM) estimator.

Our estimator behaves much like the Hill estimator; however, it addresses threshold selec- 
tion much more naturally than the latter: 

• The RBM estimator has asymptotically smooth sample paths as a function of k, and, 
even in modestly sized samples, does not suffer from small-scale instability in k. 

• Thanks to its smoothness properties, the RBM estimator admits a simple and in- 
tuitive heuristic for threshold selection that does not require fitting a second-order 
model. 

The estimator relates to backtesting in the sense that it tells us how much sample maxima
would have increased with growing sample sizes in the past. For example, Petty et al. 2012
ask how much the value of the best deal seen by a venture capital firm might increase if
the firm managed to expand the number of deals it evaluates by 10%. In this case, the
quantity computed by the RBM estimator corresponds directly to the average increase in
the log-value of the best sampled deal on permuted historical data.

The RBM estimator can be understood as belonging to two different frameworks of tail 
index estimation. The block maxima approach, which was often used in the early days of 
extreme value theory, aims to directly fit the distribution of fixed (e.g. yearly) blocks of data. 
In this light, the RBM estimator can be seen as a randomized method of moments estimator 
in the block maxima framework. Our estimator, however, can also be seen as an outgrowth 
of the more modern tail estimation paradigm started by the Hill estimator: As we will show, 
the RBM estimator can be constructed by taking a U-statistic over a Hill estimator with
two order statistics. In other words, once we start subsampling the data, the block maxima 
and Hill estimation frameworks merge and lead to the RBM estimator. 

In the next section, we outline how to use the RBM estimator, and apply it to a variety 
of datasets. After that, we study the theoretical properties of the estimator. We close with 
a simulation study which shows that, in terms of mean squared error (MSE), the RBM 
estimator is competitive with state-of-the-art threshold selection rules for the Hill estimator. 

As the examples and the simulation study should make clear, the main advantage of the 
RBM estimator is not that it beats the state-of-the-art in tail index estimation by having 
low MSE. Rather, its strength lies in its stability and ease of use. Practitioners using the 
RBM estimator can get close to optimal estimates for $\gamma$ by using an estimator $\hat\gamma(k)$ that
is smooth in the tuning parameter k. We have already emphasized that this smoothness 
facilitates threshold selection, but the advantages do not stop there: 

• The RBM estimator is stable enough in k that we can visually inspect the quality of 
the extreme value theoretic model and look for abnormal patterns that may indicate 
a failure of modeling assumptions by simply examining a plot of $\hat\gamma$ against k. In
comparison, the corresponding curve for the Hill estimator is so noisy that it can be 
difficult to pick out any meaningful patterns with the naked eye. 

• The smooth relationship between $\hat\gamma$ and k allows us to use labeled training data to
choose k by supervised risk minimization, e.g. by running RBM on multiple datasets
of the same size as our dataset of interest and with known $\gamma$, and then picking k with
the lowest prediction error. With the Hill estimator, the noise level is high enough
that selection bias can easily overwhelm any true signal; however, with the RBM
estimator the number of local minima to choose from is small and so the risk of
problems related to selection bias is greatly reduced.
• With the RBM estimator, a small change in k will usually not produce a large change
in $\hat\gamma$, and so it is more difficult for a marginally honest experimentalist to tune his
choice of k in such a way as to get the value of $\hat\gamma$ he wants. Thus, in controversial
situations, the RBM estimator may allow for less experimental bias than the Hill
estimator.

Finally, when paired with our threshold selection method, the RBM estimator allows us to
get a point estimate for $\gamma$ without having to fit a second-order model and without having
to resort to manual threshold selection (for example, Coles 2001 recommends manually
examining "a mean residual life plot" to select a threshold when estimating $\gamma$ by maximum
likelihood).

In other words, without compromising quality, our RBM estimator is easier to use and 
gives more stable estimates for $\gamma$ than the Hill estimator, which is one of the most widely
used tools for estimating the tail index of a heavy-tailed distribution. 

2. Random Block Maxima 
As described in (3), the RBM estimator for a given subsample size s is defined by

$$\hat\gamma_{RBM}(s) = s \cdot (M(s) - M(s-1)), \tag{4}$$

where M(s) is the average log maximum of a subsample of size s drawn without replacement
from the full sample of size n:

$$M(s) = \binom{n}{s}^{-1} \sum_{i_1 < ... < i_s} \max\{\log X_{i_1}, ..., \log X_{i_s}\}. \tag{5}$$

Note that since we are interested in the behavior of sample maxima, we need to use resam- 
pling without replacement instead of with replacement. Otherwise, the presence of duplicate 
elements in our subsamples would bias our estimates M(s) downwards. 
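The average in (5) runs over all size-s subsets, but it does not need to be enumerated: the j-th smallest observation is the maximum of exactly $\binom{j-1}{s-1}$ subsets, so M(s) is a weighted sum of log order statistics. The following Python sketch uses this observation; it is one possible way to evaluate (4) and (5), not necessarily how the author's implementation works, and the function names are hypothetical.

```python
import math


def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k), assuming 0 <= k <= n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def mean_log_max(sorted_logs, s):
    """M(s) from (5): mean of the maximum log value over all size-s subsets.

    sorted_logs must be in ascending order. The j-th smallest value (1-based)
    is the subset maximum with probability C(j - 1, s - 1) / C(n, s).
    """
    n = len(sorted_logs)
    log_total = log_binom(n, s)
    result = 0.0
    for j in range(s, n + 1):
        weight = math.exp(log_binom(j - 1, s - 1) - log_total)
        result += weight * sorted_logs[j - 1]
    return result


def rbm_estimator(data, s):
    """RBM estimate (4) for subsample size s, with 2 <= s <= n and positive data."""
    n = len(data)
    if not 2 <= s <= n:
        raise ValueError("s must satisfy 2 <= s <= n")
    logs = sorted(math.log(x) for x in data)
    return s * (mean_log_max(logs, s) - mean_log_max(logs, s - 1))


if __name__ == "__main__":
    print(rbm_estimator([1.0, 2.0, 4.0, 8.0], 2))
```

Working with log-binomials through `math.lgamma` avoids overflow when n is in the thousands. Each evaluation of M(s) costs O(n) after a single sort.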

To facilitate comparison between the Hill and RBM estimators, we do not parametrize
our estimator directly in terms of the subsample size s, but use

$$k = \frac{2n}{s}, \tag{6}$$

which corresponds roughly to the degrees of freedom in the data used by the RBM estimator.

The RBM estimator has high variance for small k and potentially high bias for large k.
More precisely, as shown in Theorem 3.3, the estimator has asymptotic variance

$$\lim_{n \to \infty} k(n) \, \mathrm{Var}[\hat\gamma_{RBM}(k(n))] = \gamma^2,$$

for any intermediate sequence k(n), just like the Hill estimator. Asymptotic bias increases
with k at a rate that depends on second-order parameters.

It is useful to plot $\hat\gamma_{RBM}(k)$ against k, which gives us an analog of a Hill plot. We have
found such plots to be most informative when we plot k on a log scale rather than on a linear
scale, as recommended by Drees et al. 2000. Once we have computed $\hat\gamma_{RBM}(k)$ at multiple
k, the problem becomes to choose which threshold $k_{OPT}$ to use for estimating $\gamma$. A good
choice of threshold k should aim to simultaneously keep the bias and variance components
small.

As we show in section 4, our estimator $\hat\gamma_{RBM}$ converges weakly to a $C^\infty$ limiting process. In
practice, $\hat\gamma_{RBM}$ is smooth enough as a function of k that we can reliably estimate its derivative
in finite samples. This enables a particularly simple method for selecting a threshold $k_{OPT}$
at which to report $\hat\gamma$.

We start by computing $\hat\gamma_{RBM}$ for subsample sizes s = n, n − 1, ..., 2. By (6), these choices
of s correspond to k-values $k_1 < k_2 < ... < k_{n-1}$. We then pick k using

$$k_{OPT} = \operatorname{argmin}_{k_i} \left\{ \left( \frac{\hat\gamma_{RBM}(k_{i+1}) - \hat\gamma_{RBM}(k_i)}{\log k_{i+1} - \log k_i} \right)^2 + \frac{\hat\gamma_{RBM}(k_i)^2}{2 k_i} \right\}. \tag{7}$$

Roughly speaking, this heuristic aims to minimize the square of the derivative

$$\frac{\partial}{\partial \log k} \hat\gamma_{RBM}(k)$$

subject to a penalty term that decays as 1/k. As argued in section 5, our choice of $k_{OPT}$
aims to minimize possible bias in an empirical Bayes sense. We note that this threshold
selection procedure is dependent on the smoothness properties of $\hat\gamma_{RBM}$. Attempting to use
the same method with the Hill estimator $\hat\gamma_H$ would not lead to good results, since $\hat\gamma_H$ is not
an asymptotically differentiable function of k.
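The selection rule can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the penalty has the form $\hat\gamma(k_i)^2 / (2 k_i)$, which is the form suggested by the empirical Bayes argument in section 5, and it takes a precomputed path of estimates as input. The function name `select_threshold` is hypothetical.

```python
import math


def select_threshold(ks, gammas):
    """Pick k by minimising a squared log-scale slope plus a 1/k penalty.

    ks:     increasing thresholds k_1 < k_2 < ...
    gammas: tail index estimates at those thresholds.
    Returns (k_opt, gamma_hat at k_opt).
    """
    if len(ks) != len(gammas) or len(ks) < 2:
        raise ValueError("need at least two (k, gamma) pairs of equal length")
    best = None
    for i in range(len(ks) - 1):
        # Finite-difference estimate of d gamma / d log k.
        slope = (gammas[i + 1] - gammas[i]) / (math.log(ks[i + 1]) - math.log(ks[i]))
        # Assumed penalty form: gamma^2 / (2k), decaying as 1/k.
        penalty = gammas[i] ** 2 / (2.0 * ks[i])
        score = slope ** 2 + penalty
        if best is None or score < best[0]:
            best = (score, ks[i], gammas[i])
    return best[1], best[2]


if __name__ == "__main__":
    ks = [2, 4, 8, 16, 32, 64]
    gammas = [0.5 + 0.002 * k for k in ks]   # toy path whose bias grows with k
    print(select_threshold(ks, gammas))
```

On the toy path the slope term pushes the choice toward small k while the penalty pushes it toward large k, so the minimiser lands at an intermediate threshold.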

2.1. Examples. We showcase the RBM estimator by first applying it to two simulated
datasets, and then using it to estimate the left-hand tail index of daily stock market returns.
The goal of these examples is to show how the RBM estimator can be used on real data; a
more rigorous simulation study is given in section 6.

We compare the RBM estimator to both the Hill estimator and the smoothed Hill estimator
(smooHill) proposed by Resnick and Starica 1997. There exist various heuristics for how
wide a smoothing window to use for the smooHill estimator. We follow Resnick 2007 and
average the Hill estimator on (k, 2k] for each k. For the Hill estimator, we use the method
from Guillou and Hall 2001 to automatically select k, while for RBM we use $k_{OPT}$ from
(7).
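The smoothing window described above can be sketched as follows. This is a minimal Python illustration of averaging the Hill estimator over (k, 2k]; the function names `hill_path` and `smoo_hill` are chosen here for illustration and do not come from the cited references.

```python
import math


def hill_path(data):
    """Hill estimates for k = 1, ..., n - 1; entry k - 1 holds the estimate at k."""
    logs = sorted((math.log(x) for x in data), reverse=True)   # descending order
    path = []
    running = 0.0
    for k in range(1, len(logs)):
        running += logs[k - 1]               # sum of the k largest log values
        path.append(running / k - logs[k])   # mean log-excess over X_{n-k,n}
    return path


def smoo_hill(data, k):
    """Average of the Hill estimator over the window (k, 2k]."""
    path = hill_path(data)
    if k < 1 or 2 * k > len(path):
        raise ValueError("need 1 <= k and 2k <= n - 1")
    return sum(path[j - 1] for j in range(k + 1, 2 * k + 1)) / k


if __name__ == "__main__":
    data = [math.exp(i) for i in range(5)]   # log values 0, 1, 2, 3, 4
    print(hill_path(data), smoo_hill(data, 2))
```

The running sum makes the whole Hill path available in O(n) time after sorting, which is convenient when the smoothed estimator is needed at many values of k.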

We begin by applying all three estimators first to 2000 datapoints drawn independently
from a Student-t distribution with 4 degrees of freedom ($\gamma = 0.25$), and then to 500
datapoints from a Fréchet distribution with a shape parameter of 2 ($\gamma = 0.5$). In the case of the
Student-t distribution, we discarded all negative datapoints (since all considered estimators
involve taking logs of the datapoints), giving us an effective sample size of 992. Our results
are given in Figure 1.
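For readers who want to reproduce data of this kind, the two simulated samples can be generated with the standard library alone. The sketch below is an assumption about one reasonable way to draw them (a normal over a scaled chi-squared for the Student-t, inverse-CDF sampling for the Fréchet); the seed and function names are arbitrary, so the resulting sample will not match the one used in the paper.

```python
import math
import random


def student_t(df, rng):
    """One draw from a Student-t distribution with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-squared(df) = Gamma(df / 2, scale 2)
    return z / math.sqrt(chi2 / df)


def frechet(shape, rng):
    """One draw from a Frechet distribution with CDF exp(-x^(-shape)), x > 0."""
    u = rng.random()
    while u == 0.0:                          # avoid log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / shape)


def simulate_examples(seed=0):
    """Return (positive Student-t(4) points out of 2000 draws, 500 Frechet(2) points)."""
    rng = random.Random(seed)
    t_sample = [student_t(4, rng) for _ in range(2000)]
    t_positive = [x for x in t_sample if x > 0]   # logs require positive data
    f_sample = [frechet(2.0, rng) for _ in range(500)]
    return t_positive, f_sample


if __name__ == "__main__":
    t_positive, f_sample = simulate_examples()
    print(len(t_positive), len(f_sample))
```

Roughly half of the Student-t draws survive the positivity filter, in line with the effective sample size reported above.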



We observe that the RBM estimator oscillates much less than the Hill estimator or even the 
smooHill estimator (which has asymptotically $C^1$ sample paths whereas the RBM estimator
is asymptotically smooth). The instability of the Hill estimator is not benign: Around the 
selected threshold, a small change in k can shift the confidence interval for the estimator by 
a full standard deviation and potentially change conclusions drawn from the model. Thus, 
although the estimates given by the RBM estimator at the selected thresholds are not more 
accurate than those given by either the Hill or the smooHill at the same thresholds, they are 
much less ambiguous. This should be quite useful in applications, since the less ambiguous 
the answers given by an estimator are, the easier it is to evaluate convergence, and the less 
room there is for data dredging or other types of confusion. 



[Figure 1: two panels plotting the RBM, Hill, and smooHill estimates against log k.]
(a) N = 2000 points drawn from a Student-t distribution with 4 df. (b) N = 500 points drawn
from a Fréchet distribution with $\gamma = 1/2$.

Figure 1. Comparison of the RBM, Hill, and smooHill estimators on simulated data.
The true value of $\gamma$ is shown by a horizontal line, and the error
bars are 1 standard deviation wide.



Next, we ran a similar comparison on 5 years of daily losses for the Dow Jones index. The
dataset is described in Coles 2001, and is available online through the R package ismev. We
only considered the 577 days on which the index lost value. Results are displayed in Figure 2.

Around the selected threshold, the RBM estimator is again substantially smoother than
the Hill. We obtain estimates $k_{OPT} = 33$ and a 95% confidence interval of $\gamma = 0.32 \pm 0.11$,
while the Hill estimator paired with the threshold selection method from Guillou and Hall 2001
gives $\gamma = 0.35 \pm 0.09$. Both estimates are fairly close to the value $\gamma = 0.29$ obtained by
Coles 2001 using a maximum likelihood method. Coles 2001, however, has a significantly wider
confidence interval for this estimate ($\pm 0.51$), due in part to his use of weaker distributional
assumptions that also allow for negative values of $\gamma$. Note that the confidence intervals for
the Hill and RBM estimators may be somewhat optimistic here, as they assume independence
of the data.

Figure 2. Comparison of RBM, Hill and smooHill estimators on Dow Jones
daily negative returns. The error bars are 1 standard deviation wide.

Finally, we highlight a few cases where the RBM estimator as described here can fail,
and show how to avoid these cases. First, the RBM estimator is somewhat computationally
intensive. Our implementation can comfortably handle cases where n ranges in the low
thousands; however, it becomes painfully slow when n approaches hundreds of thousands.^2
An easy way to avoid this problem without losing much information is to throw out all
but the largest M datapoints (we usually take M = 2'000 or 10'000). This speeds up the
algorithm a lot, and does not cost much in terms of accuracy because most of the information
relevant to estimating $\gamma$ is in the largest datapoints anyway.

^2 On our machine, we can run RBM with n = 1'000 in about 0.5 seconds and with n = 10'000 in
roughly 10 seconds. When n = 100'000, the program runs in 5 minutes. We made reasonable attempts to
optimize our code, but did not use any more sophisticated techniques like importing C libraries into R. The
computational complexity of running the RBM estimator for all k is $O(n^2 f(n))$, where f(n) is the complexity
of the procedure used to compute n!.

Second, our threshold selection heuristic may fail if the data given to the RBM estimator is
predominantly not from the tail of the distribution; this issue is discussed further in section
5. Again, a solution to this problem is to filter our data; in this case, we may want to throw
out all the data that does not appear to be in the tail area we are trying to model.



3. Asymptotics of Random Block Maxima

We now move to theoretical results. The limiting distribution of the RBM estimator can
largely be derived from the theory of U-statistics. A U-statistic is a multi-parameter
generalization of a sample mean: Given data $X_1, ..., X_n$ and a symmetric s-parameter function f,
the U-statistic over f is defined as

$$U_n(X_1, ..., X_n) := \binom{n}{s}^{-1} \sum_{\{I \subseteq \{1,...,n\} : |I| = s\}} f(X_I). \tag{8}$$
Such statistics have many desirable regularity properties. In particular, Hoeffding 1948
showed that when the underlying function f is held fixed, U-statistics are asymptotically
normal with variance decaying as 1/n.

As we have already stated earlier, our estimator $\mathrm{RBM}_{k,n} := \hat\gamma_{RBM}(k)$ given in (4) can
be described as a U-statistic over the Hill estimator. More precisely, for positive random
variables $X_1, ..., X_s$ with order statistics $X_{1,s} \le ... \le X_{s,s}$, let $H_1^{(s)}$ be the first Hill
estimator on s datapoints

$$H_1^{(s)}(X_1, ..., X_s) := \log X_{s,s} - \log X_{s-1,s}. \tag{9}$$

We can then write $\mathrm{RBM}_{k,n}$ as a U-statistic over $H_1^{(s)}$. The proof of the following lemma is
given in the Appendix.

Lemma 3.1. Let $X_1, ..., X_n$ be positive random variables with order statistics $X_{1,n} \le
... \le X_{n,n}$. Then, the RBM estimator given in (4) is equivalent to

$$\mathrm{RBM}_{k,n} = \binom{n}{s}^{-1} \sum_{i_1 < ... < i_s} H_1^{(s)}(X_{i_1}, ..., X_{i_s}),$$

where k satisfies the relation $s = \lfloor 2n/k \rfloor$.
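The equivalence in Lemma 3.1 can be checked numerically on a small sample by brute-force enumeration of all subsets. The Python sketch below does this; it is an illustration written for this purpose (the function names are hypothetical) and is only practical for small n because it enumerates every subset.

```python
import itertools
import math
import random


def mean_log_max_bruteforce(logs, s):
    """M(s): average of the maximum log value over all size-s subsets."""
    subsets = list(itertools.combinations(logs, s))
    return sum(max(c) for c in subsets) / len(subsets)


def rbm_from_block_maxima(data, s):
    """s * (M(s) - M(s - 1)), the block-maxima form of the estimator."""
    logs = [math.log(x) for x in data]
    return s * (mean_log_max_bruteforce(logs, s) - mean_log_max_bruteforce(logs, s - 1))


def rbm_as_u_statistic(data, s):
    """Average over all size-s subsets of log(largest) - log(second largest)."""
    logs = [math.log(x) for x in data]
    total = 0.0
    count = 0
    for subset in itertools.combinations(logs, s):
        second, top = sorted(subset)[-2:]   # two largest log values in the subset
        total += top - second
        count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [(1.0 - rng.random()) ** -0.5 for _ in range(9)]
    print(rbm_from_block_maxima(sample, 4), rbm_as_u_statistic(sample, 4))
```

The two printed values agree up to floating-point error, reflecting the fact that removing one point from a size-s subset changes its maximum only when the removed point is the largest one.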

Expressing $\mathrm{RBM}_{k,n}$ as a U-statistic enables us to leverage the extensive literature on the
topic. Our problem, however, does not quite fall into the classical scope of U-statistics. Most
of the literature assumes that the function f in (8) is fixed as n grows. But, in our case,
the functions $H_1^{(s)}$ take a number of parameters that increases with n. Such a U-statistic
is called an infinite order U-statistic. Although (as shown below) the classical asymptotic
distributional results for U-statistics still hold in our case, the infinite order nature of the
problem requires some additional work.

A common strategy for showing the asymptotic normality of a sequence of statistics $\{U_n\}$
is by approximating the $U_n$ by their Hájek projections $\hat U_n$. Suppose $X_1, ..., X_n$ are drawn
from some known distribution, and let $U_n$ be an n-parameter function. We then define its
Hájek projection $\hat U_n$ as

$$\hat U_n := E[U_n] + \sum_{i=1}^{n} E[U_n - E[U_n] \mid X_i]. \tag{10}$$

The advantage of studying such projections is that, when the $X_i$ are iid, $\hat U_n$ is a sum of
independent random variables to which we can apply the central limit theorem.

When $U_n$ is a U-statistic, $\hat U_n$ converges to $U_n$ in mean square under fairly general
conditions.

Lemma 3.2. Let $X_1, X_2, ...$ be independent and identically distributed random variables, and
let s(n) be a sequence such that $s(n) \le n$ for all n. Moreover, let $g^{(n)}$ be a sequence of
real-valued s(n)-parameter functions that are symmetric in their arguments, and let there be a
constant C such that

$$\mathrm{Var}[g^{(n)}(X_1, ..., X_{s(n)})] \le C$$

for all n. Then, taking $U_n$ as a U-statistic over $g^{(n)}$,

$$U_n = \binom{n}{s(n)}^{-1} \sum_{\{I \subseteq \{1,...,n\} : |I| = s(n)\}} g^{(n)}(X_I),$$

we find that

$$E\left[ (U_n - \hat U_n)^2 \right] = O\!\left( \left( \frac{s(n)}{n} \right)^2 \right),$$

where $\hat U_n$ is defined as in (10).

This lemma, whose proof is given in the Appendix, follows from the Efron-Stein ANOVA
decomposition. We are now ready to prove our main result. As is common in extreme value
theory, our result relies on a second-order convergence criterion. For an overview of this
second-order condition, see e.g. de Haan and Ferreira 2006.



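Theorem 3.3. Let $X_1, ..., X_n$ be drawn iid from a distribution F satisfying the second-order
condition

$$\lim_{t \to \infty} \frac{\frac{U(tx)}{U(t)} - x^{\gamma}}{A(t)} = x^{\gamma} \, \frac{x^{\rho} - 1}{\rho}, \quad \forall x > 0 \tag{11}$$

for some $\gamma > 0$, $\rho < 0$, and a function $A(t) \to 0$ with constant sign. Here, U(t) is the
inverse quantile function $U(t) = \inf\{x : 1/(1 - F(x)) \ge t\}$. Moreover, suppose that F satisfies the
technical condition

$$\lim_{x \to 0} F(x) \cdot x^{-\beta} = 0 \quad \text{for some } \beta > 0, \tag{12}$$

and let $\mathrm{RBM}_{k,n}$ be the RBM estimator as described in Lemma 3.1.
If k(n) is an intermediate sequence with $k(n) \to \infty$ and $k(n)/n \to 0$ such that

$$\lim_{n \to \infty} \sqrt{k(n)} \, A\!\left( \frac{n}{k(n)} \right) = \lambda \quad \text{for some } \lambda \in \mathbb{R}, \tag{13}$$

then, for any $a > 0$, $\mathrm{RBM}_{a k(n), n}$ is asymptotically normal with

$$\sqrt{k(n)} \left( \mathrm{RBM}_{a k(n), n} - \gamma \right) \Rightarrow \mathcal{N}\!\left( \lambda \, \Gamma(1 - \rho) \left( \frac{a}{2} \right)^{-\rho}, \; \frac{\gamma^2}{a} \right),$$

where $\Gamma$ is the gamma function. Moreover, for any $a_1, ..., a_m > 0$, the estimators $\mathrm{RBM}_{a_i k(n), n}$
are asymptotically jointly normal with covariance

$$\lim_{n \to \infty} k(n) \, \mathrm{Cov}\!\left[ \mathrm{RBM}_{a_i k(n), n}, \mathrm{RBM}_{a_j k(n), n} \right] = \frac{2 \gamma^2}{a_i + a_j}.$$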

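Proof. Let $s(n) = \lfloor 2n/k(n) \rfloor$ be the subsample block size. By Lemma 8.1,

$$\lim_{n \to \infty} \mathrm{Var}\!\left[ H_1^{(s(n))} \right] = \gamma^2.$$

Thus, by Lemma 3.2, $\mathrm{RBM}_{k(n),n}$ converges in mean square to its Hájek projection $\widehat{\mathrm{RBM}}_{k(n),n}$,
and

$$\lim_{n \to \infty} k(n) \cdot E\!\left[ \left( \mathrm{RBM}_{k(n),n} - \widehat{\mathrm{RBM}}_{k(n),n} \right)^2 \right] = 0,$$

because $k(n) \cdot (s(n)/n)^2 = 4/k(n)$ converges to zero. Moreover, as in (26), for any $a > 0$ we
can write this projection as

$$\widehat{\mathrm{RBM}}_{a k(n), n} = E\!\left[ H_1^{(s(n)/a)} \right] + \frac{s(n)}{a n} \sum_{i=1}^{n} \left( E\!\left[ H_1^{(s(n)/a)} \,\middle|\, X_i \right] - E\!\left[ H_1^{(s(n)/a)} \right] \right). \tag{14}$$

From the auxiliary lemmas, we get that for any $a, b > 0$,

$$\lim_{n \to \infty} \frac{E\!\left[ H_1^{(s(n)/a)} \right] - \gamma}{A(n/k(n))} = \Gamma(1 - \rho) \left( \frac{a}{2} \right)^{-\rho}, \quad \text{and}$$

$$\lim_{n \to \infty} s(n) \, \mathrm{Cov}\!\left[ E\!\left( H_1^{(s(n)/a)} \,\middle|\, X_1 \right), E\!\left( H_1^{(s(n)/b)} \,\middle|\, X_1 \right) \right] = \frac{a b \, \gamma^2}{a + b},$$

the second of which implies, together with (14), that

$$\lim_{n \to \infty} k(n) \, \mathrm{Cov}\!\left[ \widehat{\mathrm{RBM}}_{a k(n), n}, \widehat{\mathrm{RBM}}_{b k(n), n} \right] = \frac{2 \gamma^2}{a + b}.$$

With these expressions in hand, we can conclude using the central limit theorem for triangular
arrays and Slutsky's lemma that $\mathrm{RBM}_{k(n),n}$ has the stated asymptotic distribution. □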



The technical condition (12) is very weak, and can in practice be ignored. In an extreme
value theoretic setup we usually care about very large values, whereas this condition only
specifies the behavior of very small values. This condition trivially holds if F is supported
on $[\varepsilon, \infty)$ for some $\varepsilon > 0$.



We end this section by noting that, by Lemma 8.1, even when F does not satisfy the
second-order condition for some $\rho < 0$, or when the sequence k(n) does not satisfy (13),
$H_1^{(2n/k(n))}$ converges to $\gamma$ in expectation. Thus, by a slight modification of the proof
of Theorem 3.3, we find that, given any distribution F with tail index $\gamma > 0$, $\mathrm{RBM}_{k(n),n}$
is consistent for $\gamma$ along any intermediate sequence k(n) provided F satisfies the technical
condition (12).



4. The RBM Process 
Our result from the previous section leads naturally to the definition of an RBM process. 



Under the conditions of Theorem 3.3 with some $\gamma > 0$ and $\rho < 0$, let k(n) be an intermediate
sequence such that, for some finite $\lambda$,

$$\lim_{n \to \infty} \sqrt{k(n)} \, A\!\left( \frac{n}{k(n)} \right) = \lambda.$$

Then, writing

$$X_n(t) = \sqrt{k(n)} \left( \mathrm{RBM}_{t k(n), n} - \gamma \right), \tag{15}$$

our result in Theorem 3.3 implies that, for all $t_1, ..., t_m > 0$, the $X_n(t_i)$ are asymptotically
jointly normal with

$$\lim_{n \to \infty} E[X_n(t_i)] = \lambda \, \Gamma(1 - \rho) \left( \frac{t_i}{2} \right)^{-\rho}, \qquad \lim_{n \to \infty} \mathrm{Cov}[X_n(t_i), X_n(t_j)] = \frac{2 \gamma^2}{t_i + t_j}. \tag{16}$$

These mean and covariance equations can be used to define a Gaussian process, which we
call the RBM process.

Definition 4.1. Given values $\gamma > 0$ and $\rho < 0$, let X(t) be the Gaussian process on $\mathbb{R}_+^*$
satisfying the mean and covariance relations (16). The RBM process $R(\tau)$ is then defined
by $R(\tau) = X(e^{\tau})$ for $\tau \in \mathbb{R}$.
We define the RBM process on a log scale since this allows us to write down its properties
more cleanly. This should not be too surprising, since the estimator as written in (4) is
essentially a derivative $\partial M / \partial \log s$ with s on a log scale. In a similar vein, Drees et al. 2000 show
that the Hill process is most naturally plotted with k on a log scale.

The following lemma shows that the X n (t) do in fact converge in law to the process X(t). 
The proof is given in the Appendix. 



Lemma 4.2. Let $X_n(t)$ be defined as in (15) under the conditions of Theorem 3.3, and let
X(t) be the auxiliary process from Definition 4.1 with the appropriate $\gamma > 0$ and $\rho < 0$.
Then, the $X_n(t)$ converge weakly to X(t) on compact intervals of $\mathbb{R}_+^*$ under the Skorokhod
topology on $\mathcal{D}$.



Our RBM process is analogous to the Hill process as discussed in Resnick and Starica
1997. These two processes, however, behave very differently. While the Hill process is
equivalent to a modified Wiener process and so has continuous but non-differentiable sample
paths, the RBM process has smooth sample paths.



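Theorem 4.3. There exists a modification of the RBM process defined in Definition 4.1 that has $C^\infty$
sample paths on $\mathbb{R}$ with probability one. Moreover, for any $\tau \in \mathbb{R}$, $R(\tau)$ and $R'(\tau)$ have joint
distribution

$$\begin{pmatrix} R(\tau) \\ R'(\tau) \end{pmatrix} \sim \mathcal{N}\!\left( \lambda \, \Gamma(1 - \rho) \left( \frac{e^{\tau}}{2} \right)^{-\rho} \begin{pmatrix} 1 \\ -\rho \end{pmatrix}, \; \gamma^2 e^{-\tau} \begin{pmatrix} 1 & -1/2 \\ -1/2 & 1/2 \end{pmatrix} \right).$$

Proof. It is well known (e.g. Loève 1948) that, in order for a continuous-time stochastic
process to have an almost surely $C^\infty$ modification, it is sufficient for the covariance function
$C(\tau_1, \tau_2)$ to be infinitely differentiable along the diagonal $\tau_1 = \tau_2$. We thus immediately get
the desired smoothness result, since

$$\mathrm{Cov}[R(\tau_1), R(\tau_2)] = \frac{2 \gamma^2}{e^{\tau_1} + e^{\tau_2}}$$

is smooth on $\mathbb{R}^2$. The same result tells us that, for any $l, l' \in \mathbb{N}$,

$$\mathrm{Cov}\!\left[ R^{(l)}(\tau), R^{(l')}(\tau) \right] = \left. \frac{\partial^{\,l + l'}}{\partial u^{l} \, \partial v^{l'}} \mathrm{Cov}[R(u), R(v)] \right|_{u = v = \tau},$$

which gives us the stated covariance result. The joint normality of R and R' and the
expectation result follow directly from (16). □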
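To make the covariance claim concrete, the entries of the covariance matrix in Theorem 4.3 can be checked by differentiating the covariance kernel of R directly. The following is a worked sketch under the covariance relation in (16); the shorthand $C(u, v) = \mathrm{Cov}[R(u), R(v)]$ is introduced here for convenience.

```latex
\begin{aligned}
C(u, v) &= \frac{2\gamma^2}{e^{u} + e^{v}},
  & C(\tau, \tau) &= \gamma^2 e^{-\tau}, \\
\partial_u C(u, v) &= -\frac{2\gamma^2 e^{u}}{(e^{u} + e^{v})^2},
  & \partial_u C(u, v)\big|_{u = v = \tau} &= -\tfrac{1}{2}\,\gamma^2 e^{-\tau}, \\
\partial_u \partial_v C(u, v) &= \frac{4\gamma^2 e^{u + v}}{(e^{u} + e^{v})^3},
  & \partial_u \partial_v C(u, v)\big|_{u = v = \tau} &= \tfrac{1}{2}\,\gamma^2 e^{-\tau}.
\end{aligned}
```

The three right-hand values are $\mathrm{Var}[R(\tau)]$, $\mathrm{Cov}[R'(\tau), R(\tau)]$ and $\mathrm{Var}[R'(\tau)]$ respectively, which gives the pattern $(1, -1/2; -1/2, 1/2)$ scaled by $\gamma^2 e^{-\tau}$.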



In light of these results, we should expect the RBM estimator to have fairly smooth sample
paths even for finite n. This is consistent with our observation in section 2.1 that the RBM
estimator oscillates much less than either the Hill or the smooHill estimators.

5. Optimal Threshold Selection 

Selecting an optimal tuning parameter k for the Hill estimator is a classic problem in 
extreme value theory. Both the Hill and the RBM estimators have high variance at small 
k, and may be quite biased at high k. A successful choice of k thus hinges on adequately 
balancing the bias and variance terms. Although the tuning parameter k is integrated fairly 
differently in the Hill and RBM estimators, a given choice of k has very similar effects on both 
estimators, and so our threshold selection heuristic should be read in light of the literature 
on optimal threshold selection for the Hill estimator. 

Most approaches to selecting k require implicitly or explicitly estimating the second-order
parameter $\rho$. Danielsson et al. 2001 and Hall 1990 suggest using various sub-sample
bootstraps to estimate the MSE-minimizing threshold in smaller samples. Transforming this
small sample threshold into a full sample threshold, however, requires knowledge of $\rho$. Hall
1990 recommends just using $\rho = -1$, while Danielsson et al. 2001 use auxiliary bootstraps
to estimate the correct transformation coefficient.



Drees and Kaufmann 1998 suggest a procedure based on a law of the iterated logarithm,
which also requires fitting $\rho$. Finally, Beirlant et al. 2002 advocate plugging a consistent
estimator for $\rho$ into a formula for the optimal value of k given by Hall and Welsh 1985.



An alternative approach to threshold selection aims to stop just before the smallest value
of k at which bias can be detected. Hill 1975 originally suggested picking k just before
the log spacings between consecutive order statistics fail a test for exponentiality. This test,
however, was shown by Hall and Welsh 1985 to be too lenient, and to produce estimates
$\hat\gamma$ that were excessively biased. Guillou and Hall 2001 remedy this problem by developing
a way to jointly test for bias among high-order log spacings. The approach advocated by
Guillou and Hall 2001 does not require fitting $\rho$. This is a considerable benefit, since getting
accurate estimates for $\rho$ is not practical in many applications.

We suggest a threshold selection heuristic for the RBM estimator that is similar in spirit 
to this second class of alternatives, in that it aims to select a threshold just before significant 
bias starts to appear. However, instead of stopping just before bias can be detected at a 
given significance level, we aim to minimize possible bias in an empirical Bayes sense. Note 
that the following derivation of our threshold selection procedure is only intended as an 
informal motivation; the main argument for this procedure is that it is simple, intuitive, and 
appears to work in practice. 

Consider the RBM process $R(t)$ discussed in section 4. From Theorem 4.3 we know that, if $E[R(t)] = b(t)$ is the bias at $t$, then

$$R'(t) \sim \mathcal{N}\left(-\rho\, b(t),\ \frac{\gamma^2}{2e^t}\right). \tag{17}$$

This suggests using $R'(t)$ as a test statistic for the hypothesis $b(t) = 0$.

A first approach to selecting the optimal threshold $t$ would be to pick the first $t$ at which we
have to reject the null hypothesis $b(t) = 0$ at some significance level $\alpha$. While this approach
works decently, especially when $|\rho|$ is large, we found that an alternative empirical Bayes
approach works even better.

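Suppose that, for a fixed $t$, $b(t)$ is considered random with a uniform (improper) prior on $\mathbb{R}$. Then, using (17), we find that $b(t)$ has a posterior distribution

$$\mathcal{L}\left[b(t) \mid R'(t)\right] = \mathcal{N}\left(-\frac{R'(t)}{\rho},\ \frac{\gamma^2}{2\rho^2 e^t}\right),$$

and so

$$E\left[b^2(t) \mid R'(t)\right] = \frac{2R'(t)^2 + \gamma^2 e^{-t}}{2\rho^2}.$$

We then select

$$t_{OPT} = \operatorname{argmin}_t\, E\left[b^2(t) \mid R'(t)\right] = \operatorname{argmin}_t \left\{ R'(t)^2 + \frac{\gamma^2}{2e^t} \right\}. \tag{18}$$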



We thus aim to select the value of $t$ that gives us least cause to suspect bias, rather than the first $t$ at which we must suspect bias. The heuristic given in (18) tends in practice to be somewhat conservative, and selects thresholds $t$ that have somewhat less bias and somewhat more variance than optimal from an MSE minimization point of view. Given, however, that the bias term is elusive whereas the variance term is easily estimated, such a tradeoff may not be so bad.



Our optimization heuristic in (18) takes the form of an intuitive penalized optimization problem. Broadly speaking, the procedure tries to select a point $t$ such that $R'(t)^2$ is small, since low $R'(t)^2$ equates to low bias. However, low values of $t$ are plagued by high variability, and so we penalize small values of $t$. This procedure seems to mimic the strategy a practitioner might use in selecting $k$ from a Hill plot, and so we may hope that, even when the second-order condition does not hold or large third order effects are present, this heuristic will still give reasonable recommended thresholds.
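
The penalized criterion in (18) is simple enough to sketch in a few lines. The snippet below is a minimal illustration rather than the implementation used for the simulations: it assumes that estimates of $R'(t)$ have already been computed on a grid of $t$ values (how to obtain them is not covered here), and that the unknown $\gamma$ in the penalty is replaced by a plug-in estimate. The function name `select_threshold` and its arguments are hypothetical.

```python
import math
from typing import Sequence


def select_threshold(ts: Sequence[float],
                     r_prime: Sequence[float],
                     gamma_hat: float) -> float:
    """Return the grid point t minimizing R'(t)^2 + gamma_hat^2 / (2 e^t).

    ts        -- grid of candidate thresholds t (assumed given)
    r_prime   -- estimates of R'(t) on that grid (assumed given)
    gamma_hat -- plug-in estimate of the tail index (an assumption of this sketch)
    """
    if len(ts) == 0 or len(ts) != len(r_prime):
        raise ValueError("ts and r_prime must be non-empty and of equal length")
    best_t, best_score = ts[0], math.inf
    for t, rp in zip(ts, r_prime):
        # Squared slope (proxy for bias) plus the variance penalty from (18).
        score = rp ** 2 + gamma_hat ** 2 / (2.0 * math.exp(t))
        if score < best_score:
            best_t, best_score = t, score
    return best_t


if __name__ == "__main__":
    grid = [1.0, 2.0, 3.0, 4.0, 5.0]
    slopes = [0.0, 0.0, 0.0, 0.3, 0.6]  # made-up values for illustration
    print(select_threshold(grid, slopes, gamma_hat=1.0))
```

With these made-up inputs the penalty favors larger $t$ until the slope term starts to grow, which is the tradeoff described above.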

We end this section on a note of caution: The relation (17) only holds in the tail region of
the distribution. Thus, if we let $k$ grow large enough that the RBM estimator starts to use
substantial amounts of non-tail data, our heuristic can fail badly.³ One way to avoid such
a problem is, as discussed in section 2.1, to pre-filter our data and to only give the RBM
estimator datapoints that are in the tail area of the distribution. For example, we might
only use points that are above the mode of a coarse histogram of the data. Thankfully, such
filtering should not cost us much, as the right-hand tail is the only part of the distribution
that contains information that is relevant for estimating $\gamma$.



6. Simulation Study 

In this section, we run simulations to test our RBM estimator against three other estimators for $\gamma$. The benchmark estimators are all threshold selection rules for the Hill estimator, and are described in detail in Beirlant et al. [2004]. We compare

• $\hat\gamma_{RBM}$: Our RBM estimator, with threshold selection implemented as in (17),

• $\hat\gamma_{BDGS}$: The plugin method from Beirlant et al. [2002],

• $\hat\gamma_{DK}$: The procedure based on a law of the iterated logarithm from Drees and Kaufmann [1998], and

• $\hat\gamma_{GH}$: The diagnostic for bias from Guillou and Hall [2001].

The distributions used for testing are given below. These distributions were also used for a simulation study in Beirlant et al. [2002].

• Fréchet(2) with distribution $F(x) = e^{-x^{-2}}$, $\gamma = 1/2$, $\rho = -1$. We drew $N = 200$
datapoints from this distribution.

• Burr(1, 0.5, 2) with distribution $F(x) = 1 - (1 + \sqrt{x})^{-2}$, $\gamma = 1$, $\rho = -1/2$. We drew $N = 500$
datapoints from this distribution.

• Student-t(6) with 6 degrees of freedom, $\gamma = 1/6$, $\rho = -1/3$. We drew $N = 500$
datapoints from this distribution.

• Log-Gamma(2, 1) with density $f(x) = x^{-2} \log(x)$, $\gamma = 1$, $\rho = 0$. We drew $N = 500$
datapoints from this distribution.
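
For readers who wish to reproduce the setup, the four test distributions listed above can be sampled with the Python standard library alone. The sketch below is an illustration and not the code used for the study: the Fréchet and Burr samplers invert the stated distribution functions, the Student-t sampler uses the usual normal over chi-square construction, and the Log-Gamma sampler exponentiates a Gamma(2, 1) draw. All function names are hypothetical.

```python
import math
import random


def _open_uniform(rng: random.Random) -> float:
    """Draw u uniformly from the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def sample_frechet2(rng: random.Random) -> float:
    # Invert F(x) = exp(-x^-2):  x = (-log u)^(-1/2).
    return (-math.log(_open_uniform(rng))) ** -0.5


def sample_burr(rng: random.Random) -> float:
    # Invert F(x) = 1 - (1 + sqrt(x))^-2:  x = ((1 - u)^(-1/2) - 1)^2.
    return ((1.0 - _open_uniform(rng)) ** -0.5 - 1.0) ** 2


def sample_student_t6(rng: random.Random) -> float:
    # t_6 = Z / sqrt(chi2_6 / 6), with chi2_6 ~ Gamma(shape=3, scale=2).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(3.0, 2.0)
    return z / math.sqrt(chi2 / 6.0)


def sample_log_gamma(rng: random.Random) -> float:
    # If G ~ Gamma(shape=2, scale=1), exp(G) has density x^-2 log(x) on x > 1.
    return math.exp(rng.gammavariate(2.0, 1.0))


def draw(sampler, n: int, seed: int = 0, positive_only: bool = False) -> list:
    """Draw n points; optionally discard non-positive ones (as for Student-t)."""
    rng = random.Random(seed)
    xs = [sampler(rng) for _ in range(n)]
    return [x for x in xs if x > 0.0] if positive_only else xs
```

For example, `draw(sample_student_t6, 500, positive_only=True)` mimics the discarding of non-positive Student-t datapoints; note that the number of points kept is then random.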

Simulation results are given in Table 1. All numbers were estimated using 4000 replications. Non-positive datapoints arising with the Student-t distribution were discarded, as discussed in section 2.1.



³ To witness such a failure, one can try applying the RBM estimator on 10,000 datapoints drawn from
a Student-t distribution with 2 degrees of freedom and a mean offset of +3. The heuristic from (18) will
systematically pick a value of $k$ that is much too large.
Table 1. Comparison of root mean squared error (RMSE) and bias for four estimators. Standard sampling errors ($\times 10^{-3}$) are indicated in parentheses.

| Distribution | Statistic | $\hat\gamma_{RBM}$ | $\hat\gamma_{BDGS}$ | $\hat\gamma_{DK}$ | $\hat\gamma_{GH}$ |
|---|---|---|---|---|---|
| Fréchet | RMSE | 0.116 (2) | 0.142 (4) | 0.087 (1) | 0.102 (1) |
| | Bias | 0.011 (2) | -0.004 (2) | 0.035 (1) | 0.044 (1) |
| Burr | RMSE | 0.334 (3) | 0.442 (3) | 0.344 (3) | 0.382 (3) |
| | Bias | 0.129 (5) | 0.410 (3) | 0.261 (4) | 0.333 (3) |
| Student-t | RMSE | 0.112 (1) | 0.145 (1) | 0.149 (1) | 0.178 (1) |
| | Bias | 0.074 (1) | 0.130 (1) | 0.134 (1) | 0.168 (1) |
| Log-Gamma | RMSE | 0.293 (2) | 0.258 (3) | 0.327 (2) | 0.287 (2) |
| | Bias | 0.215 (3) | 0.182 (3) | 0.301 (2) | 0.238 (3) |
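
Summaries of the kind reported in Table 1 can be computed from a list of replicated estimates as sketched below. This is only an illustration: the text does not say how the standard sampling errors were obtained, so the delta-method formula used here for the RMSE is an assumption, and the function name `bias_and_rmse` is hypothetical.

```python
import math
from statistics import fmean, stdev


def bias_and_rmse(estimates, true_value):
    """Bias and RMSE of replicated estimates, each with a standard error.

    Requires at least two replications. The RMSE standard error uses the
    delta method, se(sqrt(MSE)) ~ se(MSE) / (2 sqrt(MSE)), which is an
    assumption of this sketch.
    """
    n = len(estimates)
    errors = [e - true_value for e in estimates]
    squared = [err * err for err in errors]
    bias = fmean(errors)
    rmse = math.sqrt(fmean(squared))
    bias_se = stdev(errors) / math.sqrt(n)
    mse_se = stdev(squared) / math.sqrt(n)
    rmse_se = mse_se / (2.0 * rmse) if rmse > 0.0 else 0.0
    return {"bias": bias, "bias_se": bias_se, "rmse": rmse, "rmse_se": rmse_se}
```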



We see that the RBM estimator is overall competitive with the other tested estimators 
in terms of MSE: RBM performs particularly well for both the Burr and the Student-t, and 
behaves reasonably for the rest. We note in particular that $\hat\gamma_{RBM}$ is substantially less biased
than either $\hat\gamma_{GH}$ or $\hat\gamma_{DK}$ and somewhat less biased than $\hat\gamma_{BDGS}$ for the surveyed distributions.
At equal MSE, having low bias may be advantageous since, as discussed earlier, variance 
terms are often easier to estimate than bias terms which depend on second-order parameters, 
and since systematic bias across multiple experiments may be hard to detect. 



7. Conclusions 



In this paper, we presented a new estimator for the tail index of a distribution in the
Fréchet domain of attraction. The estimator arose from backtesting ideas, but can also be
described as an infinite order U-statistic taken over the Hill estimator. The main advantage
of our RBM estimator in comparison with existing methods lies in its stability and ease of
use. While most commonly used estimators are extremely sensitive to small changes in the
tuning parameter $k$, the RBM estimator is stable with respect to $k$. And, while most other
estimators require either manually choosing the threshold or fitting a complicated auxiliary
model for $k$, the RBM framework admits a simple, intuitive, and largely automatic heuristic
for threshold selection. Although the results proved in this paper are asymptotic, we saw in
section 2.1 that the advantages of the RBM estimator are apparent in finite samples.



More generally, this paper presents a new approach to constructing and finding the limiting distribution of tail index estimators. The asymptotic behavior of many classical estimators can be established using results from e.g. Drees [1998] on the convergence of tail empirical processes. In the present work, however, we take a different approach and study convergence using Hájek projections and infinite order U-statistics. There are multiple opportunities to tackle further problems in extreme value theory using similar methods. In particular, it should be possible to construct a bias-corrected version of the RBM estimator by mirroring ideas from Gomes et al. [2008], or to establish an RBM-type process which would permit estimation of a general tail index $\gamma \in \mathbb{R}$.

8. Appendix: Proofs

In the following results, we use the notation $U(t)$ for the inverse quantile function

$$U(t) = \inf\left\{x : \frac{1}{1 - F(x)} \ge t\right\}. \tag{19}$$

It can be shown [e.g. de Haan and Ferreira, 2006, Section 1.2] that the distribution $F$ has extreme value index $\gamma > 0$ if and only if $U$ is a regularly varying function of index $\gamma$, i.e. $\lim_{t\to\infty} U(tx)/U(t) = x^\gamma$ for all $x > 0$. We also use $[n]$ for the set $\{1, ..., n\}$.

Lemma 8.1. Let $X_1, ..., X_m$ be drawn iid from a distribution $F$ of strictly positive support with extreme value index $\gamma > 0$. Then the first Hill estimator $H_1^{(m)} = \log X_{m,m} - \log X_{m-1,m}$ converges in distribution to an exponential random variable with mean $\gamma$. Moreover, if there is a constant $\beta > 0$ such that

$$\lim_{x\to 0} F(x) \cdot x^{-\beta} = 0,$$

then all moments of $H_1^{(m)}$ converge to the corresponding moments of the limiting random variable. In particular,

$$\lim_{m\to\infty} E\left[H_1^{(m)}\right] = \gamma \quad \text{and} \quad \lim_{m\to\infty} \mathrm{Var}\left[H_1^{(m)}\right] = \gamma^2.$$
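
As a quick sanity check of Lemma 8.1, the limit can be simulated in the one case where it holds exactly for every $m$: a pure Pareto tail with $U(t) = t^\gamma$, for which the top log spacing equals $\gamma$ times a standard exponential. The sketch below is illustrative only; the function name and the choice of the Pareto example are assumptions made for this snippet, not part of the lemma.

```python
import math
import random
from statistics import fmean, pvariance


def simulate_top_log_spacing(gamma: float, m: int, reps: int, seed: int = 0) -> list:
    """Simulate log X_{m,m} - log X_{m-1,m} for iid Pareto samples.

    Each X is u^(-gamma) with u uniform on (0, 1], i.e. X = U(Y) with
    U(t) = t^gamma and Y standard Pareto (an illustrative special case).
    """
    rng = random.Random(seed)
    spacings = []
    for _ in range(reps):
        sample = [(1.0 - rng.random()) ** (-gamma) for _ in range(m)]
        second, largest = sorted(sample)[-2:]
        spacings.append(math.log(largest) - math.log(second))
    return spacings


if __name__ == "__main__":
    h = simulate_top_log_spacing(gamma=0.5, m=50, reps=4000, seed=1)
    # Both values should be close to gamma and gamma^2 respectively.
    print(fmean(h), pvariance(h))
```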



Proof. In terms of the inverse quantile function $U(t)$ from (19), we can write $X_k = U(Y_k)$,
where the $Y_k$ are drawn independently from a distribution with cdf $F_Y(y) = \frac{y-1}{y}$ for $y > 1$.
We write $Y_{1,m} \le ... \le Y_{m,m}$ for the order statistics of the $Y_k$.



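Since $U$ is a regularly varying function of index $\gamma$, Potter's inequality [Potter, 1942] implies that, for any $\varepsilon > 0$, there is a $t_0$ such that, for all $t, tx \ge t_0$,

$$(1 - \varepsilon)\, x^{\gamma - \mathrm{sgn}[\log x] \cdot \varepsilon} \le \frac{U(tx)}{U(t)} \le (1 + \varepsilon)\, x^{\gamma + \mathrm{sgn}[\log x] \cdot \varepsilon}, \tag{20}$$

where sgn is the sign operator. Thus, since as in Lemma 8.4, $\lim_{m\to\infty} P[Y_{m-1,m} < t_0] = 0$,
we conclude that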

U (Y mm j Y mTn 



U(Y m —l,m) ^m— l,m 

Now, we note that the $\log Y_k$ have standard exponential distribution Exp. By Rényi representation [Rényi, 1953], if $E_{1,m} \le ... \le E_{m,m}$ are order statistics of a standard exponential distribution, the $E_{k,m}$ are jointly distributed as

$$E_{k,m} \stackrel{d}{=} \sum_{l=1}^{k} \frac{E^*_l}{m - l + 1}, \quad \text{with } E^*_1, ..., E^*_m \stackrel{iid}{\sim} \text{Exp}. \tag{21}$$
In particular, $E_{m,m} - E_{m-1,m}$ is exponentially distributed, and is independent from $E_{m-1,m}$.
This implies our first claim:



#i s) =lo£ 



U{Y n , 



U(Y r 



m—l,m) 



7 ■ log 



Y 



y 

1 m—l, 



Exp{^). 



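To show convergence of the $\nu$-th moment, we again use Potter's inequality, which implies that for any $\varepsilon > 0$ there is a $t_0$ such that

$$p_m(t_0) \cdot E\left[(\log[1 - \varepsilon] + (\gamma - \varepsilon) \cdot E)^\nu\right] \le E\left[\left(\log \frac{U(Y_{m,m})}{U(Y_{m-1,m})}\right)^{\nu};\ Y_{m-1,m} \ge t_0\right] \le p_m(t_0) \cdot E\left[(\log[1 + \varepsilon] + (\gamma + \varepsilon) \cdot E)^\nu\right],$$

where $E \sim \text{Exp}$ and $p_m(t_0) = P[Y_{m-1,m} \ge t_0]$. We recall that $p_m(t_0) \to 1$, and so in order to obtain convergence of moments it suffices to show that

$$\lim_{m\to\infty} E\left[\left|\log \frac{U(Y_{m,m})}{U(Y_{m-1,m})}\right|^{\nu};\ Y_{m-1,m} < t_0\right] = 0;$$

this follows from the second part of Lemma 8.4, since the technical condition near 0 holds by hypothesis. □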
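
The Rényi representation (21) used in the proof above is easy to simulate, which can help build intuition for why the top spacing $E_{m,m} - E_{m-1,m}$ is itself standard exponential. The following is a minimal sketch; the function name is hypothetical.

```python
import random


def renyi_order_statistics(m: int, rng: random.Random) -> list:
    """Build E_{1,m} <= ... <= E_{m,m} from iid standard exponentials via (21).

    The k-th order statistic is the running sum of E*_l / (m - l + 1)
    for l = 1, ..., k.
    """
    total = 0.0
    order_stats = []
    for l in range(1, m + 1):
        total += rng.expovariate(1.0) / (m - l + 1)
        order_stats.append(total)
    return order_stats


if __name__ == "__main__":
    rng = random.Random(0)
    e = renyi_order_statistics(10, rng)
    # The last increment is E*_m / 1, i.e. a standard exponential draw.
    print(e[-1] - e[-2])
```

One simple consequence is that the expected maximum of $m$ standard exponentials is the harmonic number $1 + 1/2 + ... + 1/m$, which can be compared against a direct simulation that sorts $m$ iid draws.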



Lemma 8.2. Let $X_1, ..., X_m$ be drawn from a distribution $F$ satisfying the second-order condition

$$\lim_{t\to\infty} \frac{\frac{U(tx)}{U(t)} - x^\gamma}{A(t)} = x^\gamma\, \frac{x^\rho - 1}{\rho}$$

for all $x > 0$, with some $\gamma > 0$, $\rho < 0$ and a positive or negative function $A(t)$ with $\lim_{t\to\infty} A(t) = 0$. Moreover, suppose there is a constant $\beta > 0$ such that

$$\lim_{x\to 0} F(x) \cdot x^{-\beta} = 0.$$

Then, writing $X_{1,m} \le ... \le X_{m,m}$ for the order statistics of $X$, we have, for any $a > 0$, that

E[logX mjm - logX m _i >m ] - 7 r(l - p) 



lim 

m— >oo 



A(am) 



Proof. As in the proof of Lemma 8.1, we write $X_k = U(Y_k)$ where the $Y_k$ have cdf $F_Y(y) = \frac{y-1}{y}$ for $y > 1$. Since $A(t) \to 0$, the stated second-order condition is equivalent to

$$\lim_{t\to\infty} \frac{\log U(tx) - \log U(t) - \gamma \log(x)}{A(t)} = \frac{x^\rho - 1}{\rho}$$

for all $x > 0$. By Drees [1998], there exists a function $A_0(t) \sim A(t)$ (and so without loss of generality $A_0(t) = A(t)$) such that for any $\varepsilon > 0$, there is a $t_0$ such that, for all $t \ge t_0$ and $x \ge 1$,



(22) 



log £7(tx) — log C/(t) — 7 log(x) x p — 1 



A(t) 



< ex 



For any $r < 1$ we find by Rényi representation (21) that

$$E\left[\left(\frac{Y_{m,m}}{Y_{m-1,m}}\right)^{r} \,\Big|\, Y_{m-1,m} \ge t_0\right] = \int_0^\infty e^{(r-1)x}\, dx < \infty,$$

and so, because $\rho < 0$, we find by plugging $t = Y_{m-1,m}$ and $tx = Y_{m,m}$ into (22) that for any $\delta > 0$ there is a $t_0$ such that

2 



(23) lim E 



log 



l,my \ 1 m — l,m 7 \ 1 m — l,m / 



^4(^TJ-l,m) 



i ^m— 1,771 — 



< 5. 



We now move to the case $Y_{m-1,m} < t_0$. $A(t)$ must be regularly varying [e.g. de Haan and Ferreira, 2006, Section 2.3] with index $\rho$, and so by Karamata representation we can assume without loss of generality that $A(t)$ is continuous on $[0, \infty)$ and strictly positive or strictly negative; in particular, $A(t)$ is then bounded away from 0 for finite intervals. Thus, by Lemma 8.4, the expression on (23) now integrated over the set $Y_{m-1,m} < t_0$ converges to 0.



From this we conclude that 



lim E 

771— »0O 



log 



U(Y m 



- T log(^f 



Y 

1 ri'i 



P 



0. 



Moreover, assuming without loss of generality that appropriate regularity conditions for A(t) 

p\ 2 1 



hold near t = 0, we can show along the lines of Lemma [8. 1| that 

2 

< oo, and lim E 



lim sup E 



A(Y 7 



m—l,m, 



A(m) 



A(Y r 



m— 1,777 I 



A{m) 



1 m—l, 



III 



0. 



Thus, using Cauchy-Schwarz, we establish that 



lim E 

771— >-0O 



log 



U {Ym,rri) 



U(Y m 



1 m 



A(m) 



Finally, by Renyi representation we can write 



1 m,m 1 m—l, 



m 



m 



m 



m 



1 - exp[-£ li?n ] 1 - exp[-E 2 , 
1 1 



Ei Ei + Ei 

where $E_1$ and $E_2$ are independent standard exponential and the $E_{k,m}$ are exponential order
statistics. Uniform integrability holds, and so

,P / ! \p-\ 



lim 

m—^oo 



Elog 



U(Y„, 



— 7E log 



E1+E2 



A{m) 



E 



1 



E 1 +E 2 



P 



from which the desired conclusion follows by calculus and the fact that $A(t)$ is regularly
varying of index $\rho$. □

Lemma 8.3. Let $X_{1,m} \le ... \le X_{m,m}$ be independent order statistics drawn from a distribution $F$ with extreme value index $\gamma > 0$, satisfying

$$\lim_{x\to 0} F(x) \cdot x^{-\beta} = 0, \quad \text{for some } \beta > 0.$$

Then, writing

$$\Psi_m(X) = E\left[\log X_{m,m} - \log X_{m-1,m} \mid X_1 = X\right],$$

we have

$$\lim_{m\to\infty} m\, \mathrm{Var}[\Psi_m] = \frac{\gamma^2}{2}$$

and, more generally,

$$\lim_{m\to\infty} m\, \mathrm{Cov}[\Psi_m, \Psi_{am}] = \frac{\gamma^2}{1 + a}$$

for all $a > 0$.



Proof. For convenience, write Wi = logXj. For Wi, ...,W m , and Wi, W m +i independent 
of each other, 

$$\begin{aligned}
\delta_m(w) :&= E[W_{m+1,m+1} - W_{m,m+1} \mid W_1 = w] - E[W_{m,m} - W_{m-1,m}] \\
&= E[W_{m-1,m} - w;\ W_{m-1,m} < w < W_{m,m}] + E[w - 2W_{m,m} + W_{m-1,m};\ W_{m,m} < w] \\
&= E[W_{m-1,m};\ W_{m-1,m} < w] - 2E[W_{m,m};\ W_{m,m} < w] \\
&\quad + w \cdot (2P[W_{m,m} < w] - P[W_{m-1,m} < w]).
\end{aligned}$$
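
The case analysis behind the second line of this display can be checked by brute force: adding a point $w$ to a sample changes the gap between the two largest values by $0$, by $W_{m-1,m} - w$, or by $w - 2W_{m,m} + W_{m-1,m}$, depending on where $w$ falls. The helper names below are hypothetical and the snippet is only an illustration of that bookkeeping.

```python
def top_gap(values) -> float:
    """Gap between the largest and second-largest entries."""
    ordered = sorted(values)
    return ordered[-1] - ordered[-2]


def gap_change(second: float, largest: float, w: float) -> float:
    """Change in the top gap when w joins a sample whose two largest
    values are second <= largest."""
    if w <= second:
        return 0.0                       # w does not reach the top two
    if w <= largest:
        return second - w                # w becomes the new runner-up
    return w - 2.0 * largest + second    # w becomes the new maximum


if __name__ == "__main__":
    sample = [0.3, 1.2, 2.5, 4.0]
    for w in (1.0, 3.0, 7.0):
        brute = top_gap(sample + [w]) - top_gap(sample)
        print(w, brute, gap_change(2.5, 4.0, w))
```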
Our goal is to study the distribution of $\delta_m(\log X)$ when $X \sim F$. We now proceed by evaluating each of these terms separately. As in the proof of Lemma 8.2,

$$W_{m,m} - \log U(m) \Rightarrow -\gamma \log E_1 \quad \text{and} \quad W_{m-1,m} - \log U(m) \Rightarrow -\gamma \log(E_1 + E_2),$$

where the $E_i$ are independent standard exponential random variables.



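We can use the Potter bounds (20) and Lemma 8.4 to show that the sequences $W_{m,m} - \log U(m)$ and $W_{m-1,m} - \log U(m)$ are uniformly integrable. This enables us to find the moments of interest from the limiting distributions. First, for all $w \in \mathbb{R}$,

$$\lim_{m\to\infty} E[W_{m,m} - \log U(m);\ W_{m,m} - \log U(m) < w] = -\int_{e^{-w/\gamma}}^{\infty} \gamma \log(x) \cdot e^{-x}\, dx = w\, e^{-e^{-w/\gamma}} - \gamma\, \Gamma\left(0, e^{-w/\gamma}\right),$$

where $\Gamma$ is the partial gamma function. Similarly,

$$\lim_{m\to\infty} E[W_{m-1,m} - \log U(m);\ W_{m-1,m} - \log U(m) < w] = -\int_{e^{-w/\gamma}}^{\infty} \gamma \log(x) \cdot x e^{-x}\, dx = w \left(1 + e^{-w/\gamma}\right) e^{-e^{-w/\gamma}} - \gamma \left(e^{-e^{-w/\gamma}} + \Gamma\left(0, e^{-w/\gamma}\right)\right).$$

Finally,

$$\lim_{m\to\infty} 2P[W_{m,m} - \log U(m) < w] - P[W_{m-1,m} - \log U(m) < w] = 2P\left[E_1 > e^{-w/\gamma}\right] - P\left[E_1 + E_2 > e^{-w/\gamma}\right] = \left(1 - e^{-w/\gamma}\right) e^{-e^{-w/\gamma}}.$$

Thus,

$$\lim_{m\to\infty} \delta_m(w + \log U(m)) = \gamma \left[\Gamma\left(0, e^{-w/\gamma}\right) - e^{-e^{-w/\gamma}}\right]. \tag{24}$$

It remains to find the distribution of

$$z_m := \exp\left[-\frac{\log X - \log U(m)}{\gamma}\right]$$

when $X$ is drawn from $F$. Now,

$$\lim_{m\to\infty} m P[z_m < x] = \lim_{m\to\infty} m P\left[X > U(m)\, x^{-\gamma}\right] = \lim_{m\to\infty} m P\left[X > U(m x^{-1})\right] = x,$$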



for any $x > 0$. Thus, if $\mu_{z_m}$ is the distribution of $z_m$, we find that $m \cdot \mu_{z_m}$ converges weakly to Lebesgue measure on compact intervals of $\mathbb{R}_+$.

Now, by construction, we see that the functions $g_m(w) = \delta_m(w + \log U(m))$ must be Lipschitz continuous with constant 1 (since changing $W_1$ by $\Delta$ can change $W_{m+1,m+1} - W_{m,m+1}$ by at most $\Delta$), and so the $g_m$ converge uniformly on compact intervals to $g$, where $g(w) = f(e^{-w/\gamma})$ and

$$f(z) = \gamma \cdot \left[\Gamma(0, z) - e^{-z}\right]$$

is the limiting function from (24). We can then argue by weak convergence of the $\mu_{z_m}$ to Lebesgue measure and by uniform convergence of $|g_m - g|$ to 0 that:

_ X 

hm hm mE 

c— ¥00 m— ¥00 



lim lim mE 

c—¥oo m—¥oo 












< 


c 


= lim 








c— ¥00 


1 e~ a h 










< 


c 


= lim 








c— ¥00 


I e~ c h 



j\z) dz 



7 



It now remains to show uniform integrability of $m\, \delta_m^2$. Consider the residuals

$$R_c = \lim_{m\to\infty} m\, E\left[\delta_m^2(\log X);\ \left|\log \frac{X}{U(m)}\right| > c\right].$$

By dominated convergence, if any one of the $R_c$ is finite, then $\lim_{c\to\infty} R_c = 0$. Thus, the $R_c$ only have two possible limiting values: 0 or infinity. Now, by Hoeffding's inequality [Hoeffding, 1948], we know that

$$m\, \mathrm{Var}[\Psi_m] \le \mathrm{Var}\left[H_1^{(m)}\right]$$

for all $m$. Moreover, from Lemma 8.1 we know that $\mathrm{Var}[H_1^{(m)}] \to \gamma^2$. Thus,

$$m\, E\left[\delta_m^2(\log X)\right] = m\, \mathrm{Var}[\Psi_m] \le \gamma^2$$

for all $m$. This implies that the $R_c$ are also bounded by $\gamma^2$, and so must converge to zero; thus our stated result about variance holds.
More generally, for any $a > 0$, we find that

$$\lim_{m\to\infty} m\, \mathrm{Cov}[\Psi_m, \Psi_{am}] = \gamma^2 \int_0^\infty \left(\Gamma(0, x) - e^{-x}\right) \cdot \left(\Gamma(0, ax) - e^{-ax}\right) dx = \gamma^2 \left[x\, \Gamma(0, x)\, \Gamma(0, ax) - \frac{e^{-(a+1)x}}{a + 1}\right]_{x=0}^{\infty} = \frac{\gamma^2}{a + 1}.$$

□
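
The closed form $1/(a+1)$ for the integral above can be checked numerically. The sketch below uses only the standard library: it evaluates $\Gamma(0, x)$ by a power series for small $x$ and a continued fraction for large $x$, then applies the trapezoid rule after substituting $x = e^u$. The function names, the truncation range and the step size are choices made for this illustration, and the truncation is only adequate for moderate values of $a$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def upper_gamma0(x: float) -> float:
    """Gamma(0, x): the integral of exp(-u)/u over u in [x, infinity), x > 0."""
    if x <= 1.0:
        # Power series: -gamma_E - log(x) - sum_{k>=1} (-x)^k / (k * k!).
        series, term = 0.0, 1.0
        for k in range(1, 40):
            term *= -x / k  # term == (-x)^k / k!
            series += term / k
        return -EULER_GAMMA - math.log(x) - series
    # Continued fraction exp(-x) / (x+1 - 1/(x+3 - 4/(x+5 - ...))), bottom-up.
    depth = 80
    f = x + 2 * depth + 1
    for k in range(depth, 0, -1):
        f = (x + 2 * k - 1) - k * k / f
    return math.exp(-x) / f


def covariance_integral(a: float, step: float = 0.01) -> float:
    """Trapezoid rule in u = log(x), for x between e^-30 and e^6."""
    def integrand(u: float) -> float:
        x = math.exp(u)
        fx = upper_gamma0(x) - math.exp(-x)
        fax = upper_gamma0(a * x) - math.exp(-a * x)
        return fx * fax * x  # the extra factor x is dx/du

    lo, hi = -30.0, 6.0
    n = int(round((hi - lo) / step))
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, n):
        total += integrand(lo + i * step)
    return total * step


if __name__ == "__main__":
    for a in (1.0, 3.0):
        print(a, covariance_integral(a), 1.0 / (a + 1.0))
```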



Lemma 8.4. Let $X_{1,m} \le ... \le X_{m,m}$ be independent order statistics drawn from a distribution $F$ of strictly positive support with extreme value index $\gamma > 0$. Then, for any fixed $k$ and finite $C$,

$$\lim_{m\to\infty} P[X_{m-k,m} < C] = 0.$$

Moreover, if there is a constant $\beta > 0$ such that

$$\lim_{x\to 0} F(x) \cdot x^{-\beta} = 0,$$

then, for any $\nu > 0$,

$$\lim_{m\to\infty} E\left[\left|\log[X_{m-k,m}]\right|^\nu \cdot 1\{X_{m-k,m} < C\}\right] = 0. \tag{25}$$



Proof. As in the proof of Lemma 8.1, we write $X_k = U(Y_k)$. Because $U(t) \to \infty$, the first statement follows directly by applying the strong law of large numbers to $1\{Y_k > r\}$ for a properly chosen $r > 0$. To prove the second part, we see that

$$E\left[\log[X_{m-k,m}]_+^\nu \mid X_{m-k,m} < C\right] \le \log[C]_+^\nu$$

is uniformly bounded, and we already know that $P[X_{m-k,m} < C]$ converges to zero. The hard part of establishing (25) is thus to establish a uniform bound for $E\left[\log[X_{m-k,m}]_-^\nu \mid X_{m-k,m} < C\right]$.
Now, because $\lim_{x\to 0} F(x) = 0$,

_i / F(x) \ 13 

lim Fix) ■ x p = lim • aT 1 = 

x->o y ' x^o \1 - F(x) ) 

limy? -Uil + yY 1 =0, 

where we obtain the last equivalence by writing y = ■ 

Without loss of generality, we picked $C$ such that $C = U(t_0)$ for some $t_0$. Because $U(t)$ is monotone increasing, we can find a constant $L$ such that

$$(t - 1)^\beta \cdot U(t)^{-1} \le L$$

for all $1 < t < t_0$. (Without loss of generality, let $L = 1$.) This implies that

$$E\left[\log[X_{m-k,m}]_-^\nu \mid X_{m-k,m} < C\right] = E\left[\log[U(Y_{m-k,m})]_-^\nu \mid Y_{m-k,m} < t_0\right] \le \beta^\nu \cdot E\left[\log[Y_{m-k,m} - 1]_-^\nu \mid Y_{m-k,m} < t_0\right].$$

Instead of directly showing that the last term is bounded in $m$, it suffices to show that

$$E\left[\log[Y_1 - 1]_-^\nu \mid Y_1 < t_0\right] < \infty.$$

The desired conclusion then follows by stochastic dominance of $Y_{m-k,m}$ over $Y_1$ near $Y_1 = 1$
for large enough $m$.

Meanwhile, this last expression is just 

»min{l,to} (-logy)" , / pMlM dy 

yt y / 



which can be shown by calculus to be finite for any $\nu > 0$.

Proof of Lemma 3.1. Using our notation from (5),
RBM k>n = s ■ (M(s) - M(s - 1)) 

=s 'C) e_ iog (s {Xa} 



;i + y) 



2 ' 



□ 



{AC[n]:\A\=s} 

n — s + 1 /n 



E 

{BC[n]:[fl[=*-l} 



log ( max{X b } 
1 beB 



E 

{AC[n]:|A|=»} 



• log ( max{X a } 



E log(max{X h }j 

{BcA:\B\=s-l} V 7 

7 {AC[n]:|A|=s} 



since, on the last line, s — 1 times out of s the largest element in B is the same as the largest 
in A, and once in s the largest in B is the second largest in A. To obtain the second-to-last 
line, we used the fact that each set B of size s — 1 is a subset of n — s + 1 distinct sets of 
size s. □ 

Proof of Lemma Without loss of generality, we can assume that t he all have zero 
mean. By the Efron-Stein ANOVA decomposition [Efron and Stein, 1981], for each $g^{(n)}$ there exist $j$-parameter symmetric functions $G_j^{(n)}$ with $j = 1, ..., s(n)$ such that

$$g^{(n)}(X_1, ..., X_{s(n)}) = \sum_{j=1}^{s(n)} \sum_{\{I_j \subset [s(n)] : |I_j| = j\}} G_j^{(n)}(X_{I_j}),$$

and the $G_j^{(n)}(X_{I_j})$ are all mean-zero and pairwise uncorrelated. Using this result, we can write our U-statistic as

$$U_n = \binom{n}{s(n)}^{-1} \sum_{j=1}^{s(n)} \binom{n - j}{s(n) - j} \sum_{\{I_j \subset [n] : |I_j| = j\}} G_j^{(n)}(X_{I_j}).$$

Moreover, under this notation, 



Gi B) (X0 =E [^(X^,...,^)^] 
and
(26) 



n 
s{n) 



n - 
s(n) 



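Thus since the $G_j$ are all pairwise uncorrelated and the $X_i$ are iid,

$$\begin{aligned}
E\left[(U_n - \hat U_n)^2\right] &= \binom{n}{s(n)}^{-2} \sum_{j=2}^{s(n)} \binom{n - j}{s(n) - j}^2 \binom{n}{j}\, \mathrm{Var}\left[G_j^{(n)}\right] \\
&\le \frac{s(n)(s(n) - 1)}{n(n - 1)} \sum_{j=2}^{s(n)} \binom{s(n)}{j}\, \mathrm{Var}\left[G_j^{(n)}\right] \\
&\le \frac{s(n)(s(n) - 1)}{n(n - 1)}\, \mathrm{Var}\left[g^{(n)}\right],
\end{aligned}$$

which implies the stated result, since $\mathrm{Var}[g^{(n)}] \le C$ by hypothesis. □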

Proof of Lemma We already know from Theorem ^3 that the finite dimensional distri- 
butions of the X n {t) converge in law to those of X(t). Thus, to show that X n (t) =>■ X(t) in 
$\mathcal{D}[a, b]$ for some $0 < a < b$, it suffices by e.g. Theorem 15.6 of Billingsley [1968] to show that
there is a constant $C$ such that, given any $\varepsilon > 0$, there exists a constant $N_\varepsilon$ such that for all
$t_1, t_2 \in [a, b]$ with $|t_1 - t_2| < \varepsilon$ and for all $n \ge N_\varepsilon$,

$$E\left[(X_n(t_2) - X_n(t_1))^2\right] \le C\varepsilon^2. \tag{27}$$

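To show such a bound, it is useful to decompose our expression:

$$\begin{aligned}
E\left[(X_n(t_2) - X_n(t_1))^2\right] \le{}& E\left[(X_n(t_2) - \hat X_n(t_2))^2\right] + \mathrm{Var}\left[\hat X_n(t_2) - \hat X_n(t_1)\right] \\
&+ E\left[X_n(t_2) - X_n(t_1)\right]^2 + E\left[(X_n(t_1) - \hat X_n(t_1))^2\right],
\end{aligned}$$

where $\hat X$ is a Hájek projection of $X$ as defined in (10). It now remains to bound the terms
individually.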

By Lemma 3.2 the first and the last summands decay uniformly as $O(1/k(n))$ on $[a, b]$,
and so become eventually negligible for any $\varepsilon > 0$. Meanwhile, as in (14), we can write the
variance term (i.e. the second summand) as

''''' - ' Jj\.k(n)t 2 J _ 



t\t2k(ri 



Var 



E 



\Xi 



which by Lemma ^3 converges to 

$$\frac{\gamma^2 (t_2 - t_1)^2}{t_1 t_2 (t_1 + t_2)}$$

on $[a, b]$; the result can be extended to show that the convergence is uniform over the interval. Finally, Lemma 8.2 reduces the problem of showing that $E[X_n(t)]$ satisfies the required property to showing that $a_n(t) = A(2n/tk(n))$ satisfies it; this latter task can be performed using the Potter bounds. Thus (27) holds. □